Robust detection of phonetic features in critical bands
Authors
Abstract
We consider how to detect phonetic features in noisy bandlimited speech. We propose an automatic method based on the hypothesis that independent feature detectors, working in parallel, account for the robustness of auditory strategies. Our method consists of three stages: first, speech is filtered into critical bands and enhanced by nonlinearities; second, phonetic cues are derived from narrowband measurements of periodicity and signal-to-noise ratio; third, signals from different bands are combined to make a global decision. These stages are formulated as components of a probabilistic graphical model, represented by a multilayer Bayesian network. The binary hidden variables in the model indicate phonetic cues in different parts of the frequency spectrum. We apply the model to detecting the phonetic feature [±sonorant] that distinguishes vowels, nasals, and approximants from stops, fricatives, and affricates. Implications for automatic speech recognition are discussed.
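The three stages named in the abstract can be illustrated with a small sketch. This is not the authors' model: the FFT-mask filter, the half-wave rectifier, the autocorrelation periodicity score, the logistic mapping with its `threshold` and `slope` values, the band edges, and the noisy-OR combination rule are all assumptions chosen for illustration, and the signal-to-noise measurement mentioned in the abstract is omitted. All function names are hypothetical.

```python
import numpy as np

def bandpass_fft(x, fs, lo, hi):
    """Crude band-pass filter (assumption): zero FFT bins outside [lo, hi] Hz."""
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))

def periodicity(x, fs, f0_min=80.0, f0_max=400.0):
    """Largest normalized autocorrelation over lags in an assumed pitch range."""
    x = x - x.mean()
    energy = np.dot(x, x)
    if energy == 0.0:
        return 0.0  # silent band: no evidence of periodicity
    lag_min = int(fs / f0_max)
    lag_max = int(fs / f0_min)
    scores = [np.dot(x[:-lag], x[lag:]) / energy
              for lag in range(lag_min, lag_max + 1)]
    return float(max(scores))

def band_cue_probability(x, fs, lo, hi, threshold=0.5, slope=10.0):
    """Stages 1 and 2: filter one band, rectify, score periodicity.

    The logistic mapping and its parameters are illustrative assumptions.
    """
    band = bandpass_fft(x, fs, lo, hi)
    rectified = np.maximum(band, 0.0)  # half-wave rectification as the nonlinearity
    score = periodicity(rectified, fs)
    return 1.0 / (1.0 + np.exp(-slope * (score - threshold)))

def noisy_or(probs):
    """Stage 3 (assumed rule): the global cue fires if any band cue fires."""
    return 1.0 - float(np.prod(1.0 - np.asarray(probs)))

def detect_sonorant(x, fs, bands):
    """Return the combined probability and the per-band probabilities."""
    probs = [band_cue_probability(x, fs, lo, hi) for lo, hi in bands]
    return noisy_or(probs), probs

# Hypothetical band edges in Hz; real critical bands are spaced differently.
BANDS = [(250, 650), (650, 1250), (1250, 2050), (2050, 3350)]

if __name__ == "__main__":
    fs = 8000
    t = np.arange(800) / fs
    # Synthetic "voiced" frame: harmonics of a 100 Hz fundamental.
    voiced = sum(np.sin(2 * np.pi * 100 * k * t) for k in range(1, 34))
    p, per_band = detect_sonorant(voiced, fs, BANDS)
    print("combined:", p, "per band:", per_band)
```

Because each band is scored independently before the combination step, corrupting one band changes only that band's probability, which is the intuition behind the parallel-detector hypothesis stated in the abstract.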
Similar resources
Speech activity detection fusing acoustic phonetic and energy features
With the wider deployment of automatic speech recognition (ASR) systems, the importance of robust speech activity detection has been elevated both as a means of reducing bandwidth in client/server ASR and for overall system stability from barge-in through the recognition process. In this paper we investigate a novel technique for speech activity detection that we have found to be effective in ...
Landmark detection for distinctive feature-based speech recognition
This work is a component of a proposed knowledge-based speech recognition system which uses landmarks to guide the search for distinctive features. In the speech signal, landmarks identify times when the acoustic manifestations of the linguistically motivated distinctive features are most salient. This paper describes an algorithm for automatically detecting acoustically abrupt landmarks. Some ...
Multi-View Face Detection in Open Environments using Gabor Features and Neural Networks
Multi-view face detection in open environments is a challenging task, due to the wide variations in illumination, face appearances and occlusion. In this paper, a robust method for multi-view face detection in open environments, using a combination of Gabor features and neural networks, is presented. Firstly, the effect of changing the Gabor filter parameters (orientation, frequency, standard d...
Towards Phonetically-Driven Hidden Markov Models: Can We Incorporate Phonetic Landmarks in HMM-Based ASR?
Automatic speech recognition mainly relies on hidden Markov models (HMM), which make little use of phonetic knowledge. As an alternative, landmark-based recognizers rely mainly on precise phonetic knowledge and exploit distinctive features. We propose a theoretical framework to combine both approaches by introducing phonetic knowledge in a non-stationary HMM decoder. To demonstrate the potential...
Phonetic unit localization in a multi-expert recognition system
This paper describes an acoustic-to-phonetic decoder (APD) based on a mixed strategy: (a) bottom-up, which hypothesizes the most robust information about the speech signal, and (b) top-down, which verifies the acoustic features or the macro-class localization on the speech signal. In this paper, only the bottom-up strategy is described. In our system, a phoneme is described...